Dataset statistics
| | Dataset A | Dataset B |
|---|---|---|
| Number of variables | 12 | 12 |
| Number of observations | 446 | 446 |
| Missing cells | 431 | 423 |
| Missing cells (%) | 8.1% | 7.9% |
| Duplicate rows | 0 | 0 |
| Duplicate rows (%) | 0.0% | 0.0% |
| Total size in memory | 45.3 KiB | 45.3 KiB |
| Average record size in memory | 104.0 B | 104.0 B |
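The derived rows of this table can be checked against the raw counts. The sketch below assumes the missing-cell percentage is taken over all cells (variables × observations) and that the average record size is the total memory divided by the number of observations. The variable names are illustrative and not part of the report.

```python
# Counts copied from the "Dataset statistics" table.
n_variables = 12
n_observations = 446
missing_cells = {"Dataset A": 431, "Dataset B": 423}
total_size_kib = 45.3

# Every dataset has the same number of cells: 12 * 446 = 5352.
total_cells = n_variables * n_observations

# Share of missing cells in percent, rounded to one decimal as in the report.
missing_pct = {
    name: round(100 * count / total_cells, 1)
    for name, count in missing_cells.items()
}

# Average record size in bytes (1 KiB = 1024 B).
avg_record_bytes = round(total_size_kib * 1024 / n_observations, 1)

print(missing_pct)
print(avg_record_bytes)
```

Both results agree with the reported 8.1% / 7.9% and 104.0 B.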
Variable types
| | Dataset A | Dataset B |
|---|---|---|
| Numeric | 5 | 5 |
| Categorical | 7 | 7 |
| Dataset A | Dataset B | Alert type |
|---|---|---|
| Name has a high cardinality: 446 distinct values | Name has a high cardinality: 446 distinct values | High Cardinality |
| Ticket has a high cardinality: 375 distinct values | Ticket has a high cardinality: 375 distinct values | High Cardinality |
| Cabin has a high cardinality: 82 distinct values | Cabin has a high cardinality: 93 distinct values | High Cardinality |
| Survived is highly overall correlated with Sex | Survived is highly overall correlated with Sex | High Correlation |
| Sex is highly overall correlated with Survived | Sex is highly overall correlated with Survived | High Correlation |
| Age has 84 (18.8%) missing values | Age has 89 (20.0%) missing values | Missing |
| Cabin has 347 (77.8%) missing values | Cabin has 334 (74.9%) missing values | Missing |
| Name is uniformly distributed | Name is uniformly distributed | Uniform |
| Ticket is uniformly distributed | Ticket is uniformly distributed | Uniform |
| Cabin is uniformly distributed | Cabin is uniformly distributed | Uniform |
| PassengerId has unique values | PassengerId has unique values | Unique |
| Name has unique values | Name has unique values | Unique |
| SibSp has 307 (68.8%) zeros | SibSp has 289 (64.8%) zeros | Zeros |
| Parch has 343 (76.9%) zeros | Parch has 326 (73.1%) zeros | Zeros |
| Fare has 9 (2.0%) zeros | Fare has 10 (2.2%) zeros | Zeros |
Reproduction
| | Dataset A | Dataset B |
|---|---|---|
| Analysis started | 2023-03-08 14:20:12.021364 | 2023-03-08 14:20:18.887236 |
| Analysis finished | 2023-03-08 14:20:18.883505 | 2023-03-08 14:20:23.675310 |
| Duration | 6.86 seconds | 4.79 seconds |
| Software version | ydata-profiling v0.0.dev0 | ydata-profiling v0.0.dev0 |
| Download configuration | config.json | config.json |
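The Duration row follows directly from the two timestamps. A minimal check using only the standard library, assuming the timestamps are in the same time zone and formatted as shown:

```python
from datetime import datetime

# Timestamp layout used in the "Reproduction" table.
FMT = "%Y-%m-%d %H:%M:%S.%f"

# (analysis started, analysis finished), copied from the table.
runs = {
    "Dataset A": ("2023-03-08 14:20:12.021364", "2023-03-08 14:20:18.883505"),
    "Dataset B": ("2023-03-08 14:20:18.887236", "2023-03-08 14:20:23.675310"),
}

durations = {}
for name, (started, finished) in runs.items():
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    # Round to two decimals, as the report does.
    durations[name] = round(delta.total_seconds(), 2)

print(durations)
```

The timestamps also show that the Dataset B analysis started a few milliseconds after the Dataset A analysis finished, so the two profiles appear to have run one after the other.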
PassengerId
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 456.97982 | 429.06278 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 2 |
| Maximum | 889 | 890 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 1 | 2 |
| 5-th percentile | 47.25 | 39.75 |
| Q1 | 251.5 | 213.25 |
| median | 457 | 423.5 |
| Q3 | 668.25 | 637.75 |
| 95-th percentile | 850.25 | 844.5 |
| Maximum | 889 | 890 |
| Range | 888 | 888 |
| Interquartile range (IQR) | 416.75 | 424.5 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 251.33269 | 254.03925 |
| Coefficient of variation (CV) | 0.54998642 | 0.59207945 |
| Kurtosis | -1.1117045 | -1.128275 |
| Mean | 456.97982 | 429.06278 |
| Median Absolute Deviation (MAD) | 208.5 | 212.5 |
| Skewness | -0.034598277 | 0.08687862 |
| Sum | 203813 | 191362 |
| Variance | 63168.123 | 64535.942 |
| Monotonicity | Not monotonic | Not monotonic |
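Several PassengerId statistics are functions of other rows in the same tables. The sketch below recomputes them from the reported values, assuming the usual definitions (range = max − min, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared, mean = sum / count). The dictionary layout is illustrative only.

```python
# Reported PassengerId statistics, copied from the tables above.
stats = {
    "Dataset A": {"min": 1, "max": 889, "q1": 251.5, "q3": 668.25,
                  "std": 251.33269, "mean": 456.97982, "sum": 203813},
    "Dataset B": {"min": 2, "max": 890, "q1": 213.25, "q3": 637.75,
                  "std": 254.03925, "mean": 429.06278, "sum": 191362},
}
n_observations = 446  # PassengerId has no missing values

derived = {}
for name, s in stats.items():
    derived[name] = {
        "range": s["max"] - s["min"],
        "iqr": s["q3"] - s["q1"],
        "cv": s["std"] / s["mean"],
        "variance": s["std"] ** 2,
        "mean_from_sum": s["sum"] / n_observations,
    }

for name, d in derived.items():
    print(name, d)
```

Up to rounding of the displayed inputs, the results agree with the Range, IQR, CV, Variance, and Mean rows.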
| Value | Count | Frequency (%) |
| 165 | 1 | 0.2% |
| 573 | 1 | 0.2% |
| 205 | 1 | 0.2% |
| 229 | 1 | 0.2% |
| 377 | 1 | 0.2% |
| 183 | 1 | 0.2% |
| 273 | 1 | 0.2% |
| 325 | 1 | 0.2% |
| 438 | 1 | 0.2% |
| 602 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 66 | 1 | 0.2% |
| 228 | 1 | 0.2% |
| 884 | 1 | 0.2% |
| 42 | 1 | 0.2% |
| 695 | 1 | 0.2% |
| 126 | 1 | 0.2% |
| 242 | 1 | 0.2% |
| 486 | 1 | 0.2% |
| 106 | 1 | 0.2% |
| 348 | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| 1 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 14 | 1 | |
| 15 | 1 | |
| 17 | 1 | |
| 19 | 1 | |
| 20 | 1 |
| Value | Count | Frequency (%) |
| 2 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 | |
| 12 | 1 | |
| 14 | 1 | |
| 18 | 1 |
Survived
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 2 | 2 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 0 | 1 |
| 2nd row | 1 | 0 |
| 3rd row | 0 | 0 |
| 4th row | 0 | 0 |
| 5th row | 1 | 0 |
Common Values
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 260 | |
| 1 | 186 |
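The Frequency (%) cells are blank in the two Survived tables. Assuming the column is the count divided by the 446 observations of each dataset (which is how the filled-in cells elsewhere in this report behave), the shares can be sketched as follows:

```python
# Counts of Survived values, copied from the "Common Values" tables.
survived_counts = {
    "Dataset A": {0: 272, 1: 174},
    "Dataset B": {0: 260, 1: 186},
}

shares = {}
for name, counts in survived_counts.items():
    total = sum(counts.values())  # 446 in both datasets
    # Percentage of rows per value, rounded to one decimal.
    shares[name] = {value: round(100 * n / total, 1) for value, n in counts.items()}

print(shares)
```

Under that assumption, about 39.0% of Dataset A and 41.7% of Dataset B passengers have Survived = 1.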
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 260 | |
| 1 | 186 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 260 | |
| 1 | 186 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 260 | |
| 1 | 186 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 272 | |
| 1 | 174 |
| Value | Count | Frequency (%) |
| 0 | 260 | |
| 1 | 186 |
Pclass
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 3 | 3 |
| 2nd row | 3 | 1 |
| 3rd row | 1 | 1 |
| 4th row | 1 | 3 |
| 5th row | 3 | 1 |
Common Values
| Value | Count | Frequency (%) |
| 3 | 247 | |
| 1 | 114 | |
| 2 | 85 | 19.1% |
| Value | Count | Frequency (%) |
| 3 | 229 | |
| 1 | 125 | |
| 2 | 92 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 247 | |
| 1 | 114 | |
| 2 | 85 | 19.1% |
| Value | Count | Frequency (%) |
| 3 | 229 | |
| 1 | 125 | |
| 2 | 92 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
| Value | Count | Frequency (%) |
| Decimal Number | 446 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 247 | |
| 1 | 114 | |
| 2 | 85 | 19.1% |
| Value | Count | Frequency (%) |
| 3 | 229 | |
| 1 | 125 | |
| 2 | 92 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 446 |
| Value | Count | Frequency (%) |
| Common | 446 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 247 | |
| 1 | 114 | |
| 2 | 85 | 19.1% |
| Value | Count | Frequency (%) |
| 3 | 229 | |
| 1 | 125 | |
| 2 | 92 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 247 | |
| 1 | 114 | |
| 2 | 85 | 19.1% |
| Value | Count | Frequency (%) |
| 3 | 229 | |
| 1 | 125 | |
| 2 | 92 |
Name
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 446 | 446 |
| Distinct (%) | 100.0% | 100.0% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Dataset A
| Value | Count |
|---|---|
| Panula, Master. Eino Viljami | 1 |
| Flynn, Mr. John Irwin ("Irving") | 1 |
| Cohen, Mr. Gurshon "Gus" | 1 |
| Fahlstrom, Mr. Arne Jonas | 1 |
| Landergren, Miss. Aurora Adelia | 1 |
| Other values (441) | |

Dataset B
| Value | Count |
|---|---|
| Moubarek, Master. Gerios | 1 |
| Lovell, Mr. John Hall ("Henry") | 1 |
| Banfield, Mr. Frederick James | 1 |
| Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott) | 1 |
| Weir, Col. John | 1 |
| Other values (441) | |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 82 | 82 |
| Median length | 48 | 49.5 |
| Mean length | 26.840807 | 27.159193 |
| Min length | 13 | 13 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 11971 | 12113 |
| Distinct characters | 59 | 59 |
| Distinct categories | 7 | 7 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 446 | 446 |
| Unique (%) | 100.0% | 100.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | Panula, Master. Eino Viljami | Moubarek, Master. Gerios |
| 2nd row | Jalsevac, Mr. Ivan | Futrelle, Mr. Jacques Heath |
| 3rd row | Reuchlin, Jonkheer. John George | Silvey, Mr. William Baird |
| 4th row | Guggenheim, Mr. Benjamin | Pavlovic, Mr. Stefo |
| 5th row | Sunderland, Mr. Victor Francis | Thayer, Mr. John Borland |
Common Values
| Value | Count | Frequency (%) |
| Panula, Master. Eino Viljami | 1 | 0.2% |
| Flynn, Mr. John Irwin ("Irving") | 1 | 0.2% |
| Cohen, Mr. Gurshon "Gus" | 1 | 0.2% |
| Fahlstrom, Mr. Arne Jonas | 1 | 0.2% |
| Landergren, Miss. Aurora Adelia | 1 | 0.2% |
| Asplund, Master. Clarence Gustaf Hugo | 1 | 0.2% |
| Mellinger, Mrs. (Elizabeth Anne Maidment) | 1 | 0.2% |
| Sage, Mr. George John Jr | 1 | 0.2% |
| Richards, Mrs. Sidney (Emily Hocking) | 1 | 0.2% |
| Slabenoff, Mr. Petco | 1 | 0.2% |
| Other values (436) | 436 |
| Value | Count | Frequency (%) |
| Moubarek, Master. Gerios | 1 | 0.2% |
| Lovell, Mr. John Hall ("Henry") | 1 | 0.2% |
| Banfield, Mr. Frederick James | 1 | 0.2% |
| Turpin, Mrs. William John Robert (Dorothy Ann Wonnacott) | 1 | 0.2% |
| Weir, Col. John | 1 | 0.2% |
| Nicola-Yarred, Master. Elias | 1 | 0.2% |
| Murphy, Miss. Katherine "Kate" | 1 | 0.2% |
| Lefebre, Miss. Jeannie | 1 | 0.2% |
| Mionoff, Mr. Stoytcho | 1 | 0.2% |
| Davison, Mrs. Thomas Henry (Mary E Finck) | 1 | 0.2% |
| Other values (436) | 436 |
Length
Common Values (Plot)
Dataset A
Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
Dataset B
Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
| Value | Count | Frequency (%) |
| mr | 257 | 14.2% |
| miss | 99 | 5.5% |
| mrs | 62 | 3.4% |
| william | 32 | 1.8% |
| john | 19 | 1.1% |
| master | 17 | 0.9% |
| henry | 16 | 0.9% |
| frederick | 13 | 0.7% |
| charles | 13 | 0.7% |
| james | 13 | 0.7% |
| Other values (875) | 1263 |
| Value | Count | Frequency (%) |
| mr | 260 | 14.3% |
| miss | 94 | 5.2% |
| mrs | 63 | 3.5% |
| william | 31 | 1.7% |
| john | 22 | 1.2% |
| master | 21 | 1.2% |
| henry | 16 | 0.9% |
| thomas | 13 | 0.7% |
| george | 12 | 0.7% |
| charles | 11 | 0.6% |
| Other values (887) | 1280 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 1358 | 11.3% |
| r | 949 | 7.9% |
| e | 843 | 7.0% |
| a | 836 | 7.0% |
| n | 676 | 5.6% |
| s | 663 | 5.5% |
| i | 656 | 5.5% |
| M | 568 | 4.7% |
| o | 511 | 4.3% |
| l | 510 | 4.3% |
| Other values (49) | 4401 |
| Value | Count | Frequency (%) |
| (space) | 1377 | 11.4% |
| r | 994 | 8.2% |
| e | 850 | 7.0% |
| a | 834 | 6.9% |
| i | 665 | 5.5% |
| s | 652 | 5.4% |
| n | 639 | 5.3% |
| M | 569 | 4.7% |
| l | 541 | 4.5% |
| o | 510 | 4.2% |
| Other values (49) | 4482 |
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 7711 | |
| Uppercase Letter | 1811 | 15.1% |
| Space Separator | 1358 | 11.3% |
| Other Punctuation | 952 | 8.0% |
| Close Punctuation | 67 | 0.6% |
| Open Punctuation | 67 | 0.6% |
| Dash Punctuation | 5 | < 0.1% |
| Value | Count | Frequency (%) |
| Lowercase Letter | 7799 | |
| Uppercase Letter | 1831 | 15.1% |
| Space Separator | 1377 | 11.4% |
| Other Punctuation | 957 | 7.9% |
| Close Punctuation | 72 | 0.6% |
| Open Punctuation | 72 | 0.6% |
| Dash Punctuation | 5 | < 0.1% |
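The category names in these tables are Unicode general categories. As an illustration of how such counts can be obtained (not necessarily how the profiling tool does it), the sketch below classifies the characters of one name from the sample using the standard `unicodedata` module. The helper `category_counts` is hypothetical. The two-letter codes map to the table labels as follows: `Ll` = Lowercase Letter, `Lu` = Uppercase Letter, `Zs` = Space Separator, `Po` = Other Punctuation, `Pe` = Close Punctuation, `Ps` = Open Punctuation, `Pd` = Dash Punctuation.

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count the characters of `text` by Unicode general category code."""
    return Counter(unicodedata.category(ch) for ch in text)


# One name taken from the Dataset B sample above.
counts = category_counts("Weir, Col. John")
print(counts)
```

Summing such counters over every value in the column would give totals of the kind shown in the "Most occurring categories" tables.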
Most frequent character per category
Space Separator
| Value | Count | Frequency (%) |
| (space) | 1358 | |
| Value | Count | Frequency (%) |
| (space) | 1377 | |
Lowercase Letter
| Value | Count | Frequency (%) |
| r | 949 | |
| e | 843 | |
| a | 836 | |
| n | 676 | |
| s | 663 | |
| i | 656 | |
| o | 511 | 6.6% |
| l | 510 | 6.6% |
| t | 333 | 4.3% |
| h | 261 | 3.4% |
| Other values (16) | 1473 |
| Value | Count | Frequency (%) |
| r | 994 | |
| e | 850 | |
| a | 834 | |
| i | 665 | |
| s | 652 | |
| n | 639 | |
| l | 541 | 6.9% |
| o | 510 | 6.5% |
| t | 338 | 4.3% |
| h | 276 | 3.5% |
| Other values (16) | 1500 |
Uppercase Letter
| Value | Count | Frequency (%) |
| M | 568 | |
| A | 115 | 6.4% |
| J | 109 | 6.0% |
| C | 100 | 5.5% |
| S | 95 | 5.2% |
| H | 94 | 5.2% |
| E | 80 | 4.4% |
| L | 70 | 3.9% |
| W | 70 | 3.9% |
| R | 58 | 3.2% |
| Other values (15) | 452 |
| Value | Count | Frequency (%) |
| M | 569 | |
| A | 132 | 7.2% |
| H | 107 | 5.8% |
| J | 102 | 5.6% |
| C | 88 | 4.8% |
| E | 84 | 4.6% |
| S | 80 | 4.4% |
| B | 78 | 4.3% |
| W | 74 | 4.0% |
| G | 62 | 3.4% |
| Other values (15) | 455 |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 447 | |
| , | 446 | |
| " | 54 | 5.7% |
| ' | 5 | 0.5% |
| Value | Count | Frequency (%) |
| , | 446 | |
| . | 446 | |
| " | 62 | 6.5% |
| ' | 3 | 0.3% |
Close Punctuation
| Value | Count | Frequency (%) |
| ) | 67 |
| Value | Count | Frequency (%) |
| ) | 72 |
Open Punctuation
| Value | Count | Frequency (%) |
| ( | 67 |
| Value | Count | Frequency (%) |
| ( | 72 |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 5 |
| Value | Count | Frequency (%) |
| - | 5 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 9522 | |
| Common | 2449 | 20.5% |
| Value | Count | Frequency (%) |
| Latin | 9630 | |
| Common | 2483 | 20.5% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| (space) | 1358 | |
| . | 447 | 18.3% |
| , | 446 | 18.2% |
| ) | 67 | 2.7% |
| ( | 67 | 2.7% |
| " | 54 | 2.2% |
| - | 5 | 0.2% |
| ' | 5 | 0.2% |
| Value | Count | Frequency (%) |
| (space) | 1377 | |
| , | 446 | 18.0% |
| . | 446 | 18.0% |
| ) | 72 | 2.9% |
| ( | 72 | 2.9% |
| " | 62 | 2.5% |
| - | 5 | 0.2% |
| ' | 3 | 0.1% |
Latin
| Value | Count | Frequency (%) |
| r | 949 | 10.0% |
| e | 843 | 8.9% |
| a | 836 | 8.8% |
| n | 676 | 7.1% |
| s | 663 | 7.0% |
| i | 656 | 6.9% |
| M | 568 | 6.0% |
| o | 511 | 5.4% |
| l | 510 | 5.4% |
| t | 333 | 3.5% |
| Other values (41) | 2977 |
| Value | Count | Frequency (%) |
| r | 994 | 10.3% |
| e | 850 | 8.8% |
| a | 834 | 8.7% |
| i | 665 | 6.9% |
| s | 652 | 6.8% |
| n | 639 | 6.6% |
| M | 569 | 5.9% |
| l | 541 | 5.6% |
| o | 510 | 5.3% |
| t | 338 | 3.5% |
| Other values (41) | 3038 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 11971 |
| Value | Count | Frequency (%) |
| ASCII | 12113 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| (space) | 1358 | 11.3% |
| r | 949 | 7.9% |
| e | 843 | 7.0% |
| a | 836 | 7.0% |
| n | 676 | 5.6% |
| s | 663 | 5.5% |
| i | 656 | 5.5% |
| M | 568 | 4.7% |
| o | 511 | 4.3% |
| l | 510 | 4.3% |
| Other values (49) | 4401 |
| Value | Count | Frequency (%) |
| (space) | 1377 | 11.4% |
| r | 994 | 8.2% |
| e | 850 | 7.0% |
| a | 834 | 6.9% |
| i | 665 | 5.5% |
| s | 652 | 5.4% |
| n | 639 | 5.3% |
| M | 569 | 4.7% |
| l | 541 | 4.5% |
| o | 510 | 4.2% |
| Other values (49) | 4482 |
Sex
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 2 | 2 |
| Distinct (%) | 0.4% | 0.4% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 6 | 6 |
| Median length | 4 | 4 |
| Mean length | 4.735426 | 4.7040359 |
| Min length | 4 | 4 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2112 | 2098 |
| Distinct characters | 5 | 5 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | male | male |
| 2nd row | male | male |
| 3rd row | male | male |
| 4th row | male | male |
| 5th row | male | male |
Common Values
| Value | Count | Frequency (%) |
| male | 282 | |
| female | 164 |
| Value | Count | Frequency (%) |
| male | 289 | |
| female | 157 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 610 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 164 | 7.8% |
| Value | Count | Frequency (%) |
| e | 603 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 157 | 7.5% |
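For a column with only two values, the character statistics can be reconstructed from the value counts alone. The sketch below does this for Sex and recovers the total character count, the mean length, and the per-character counts. The helper `char_counts` is illustrative and not part of the report.

```python
from collections import Counter

# Value counts for Sex, copied from the "Common Values" tables.
value_counts = {
    "Dataset A": {"male": 282, "female": 164},
    "Dataset B": {"male": 289, "female": 157},
}


def char_counts(counts: dict) -> Counter:
    """Weight each character of a value by how often the value occurs."""
    total = Counter()
    for value, n in counts.items():
        for ch in value:
            total[ch] += n
    return total


summary = {}
for name, counts in value_counts.items():
    chars = char_counts(counts)
    total_chars = sum(chars.values())
    summary[name] = {
        "chars": chars,
        "total_chars": total_chars,
        "mean_length": total_chars / sum(counts.values()),
    }

for name, s in summary.items():
    print(name, s["total_chars"], round(s["mean_length"], 6), dict(s["chars"]))
```

The letter `e` is counted twice for every "female" and once for every "male", which is why it leads the tables with 610 and 603 occurrences.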
Most occurring categories
| Value | Count | Frequency (%) |
| Lowercase Letter | 2112 |
| Value | Count | Frequency (%) |
| Lowercase Letter | 2098 |
Most frequent character per category
Lowercase Letter
| Value | Count | Frequency (%) |
| e | 610 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 164 | 7.8% |
| Value | Count | Frequency (%) |
| e | 603 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 157 | 7.5% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 2112 |
| Value | Count | Frequency (%) |
| Latin | 2098 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| e | 610 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 164 | 7.8% |
| Value | Count | Frequency (%) |
| e | 603 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 157 | 7.5% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2112 |
| Value | Count | Frequency (%) |
| ASCII | 2098 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| e | 610 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 164 | 7.8% |
| Value | Count | Frequency (%) |
| e | 603 | |
| m | 446 | |
| a | 446 | |
| l | 446 | |
| f | 157 | 7.5% |
Age
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 74 | 75 |
| Distinct (%) | 20.4% | 21.0% |
| Missing | 84 | 89 |
| Missing (%) | 18.8% | 20.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 29.562845 | 29.163165 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.67 |
| Maximum | 80 | 80 |
| Zeros | 0 | 0 |
| Zeros (%) | 0.0% | 0.0% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0.42 | 0.67 |
| 5-th percentile | 3.05 | 4 |
| Q1 | 21 | 20 |
| median | 28 | 28 |
| Q3 | 37.75 | 36 |
| 95-th percentile | 54.95 | 54.4 |
| Maximum | 80 | 80 |
| Range | 79.58 | 79.33 |
| Interquartile range (IQR) | 16.75 | 16 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 14.498621 | 14.637636 |
| Coefficient of variation (CV) | 0.49043387 | 0.50192205 |
| Kurtosis | 0.23576311 | 0.24530319 |
| Mean | 29.562845 | 29.163165 |
| Median Absolute Deviation (MAD) | 8 | 8 |
| Skewness | 0.32900359 | 0.40175093 |
| Sum | 10701.75 | 10411.25 |
| Variance | 210.21 | 214.26038 |
| Monotonicity | Not monotonic | Not monotonic |
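Age is the only numeric column here with missing values, so the denominator matters. The figures above are consistent with the mean and the Distinct (%) being computed over non-missing rows only, while Missing (%) is taken over all 446 rows. A small check under that reading, with illustrative names:

```python
n_observations = 446

# Values copied from the Age tables above.
age = {
    "Dataset A": {"missing": 84, "distinct": 74, "sum": 10701.75},
    "Dataset B": {"missing": 89, "distinct": 75, "sum": 10411.25},
}

checks = {}
for name, a in age.items():
    n_valid = n_observations - a["missing"]  # rows with a recorded age
    checks[name] = {
        "n_valid": n_valid,
        "mean": a["sum"] / n_valid,
        "distinct_pct": round(100 * a["distinct"] / n_valid, 1),
        "missing_pct": round(100 * a["missing"] / n_observations, 1),
    }

for name, c in checks.items():
    print(name, c)
```

Dividing the sums by 446 instead would give means near 24.0 and 23.3, which do not match the report.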
| Value | Count | Frequency (%) |
| 28 | 17 | 3.8% |
| 24 | 15 | 3.4% |
| 22 | 14 | 3.1% |
| 18 | 14 | 3.1% |
| 27 | 14 | 3.1% |
| 36 | 14 | 3.1% |
| 21 | 13 | 2.9% |
| 19 | 11 | 2.5% |
| 35 | 10 | 2.2% |
| 29 | 10 | 2.2% |
| Other values (64) | 230 | |
| (Missing) | 84 | 18.8% |
| Value | Count | Frequency (%) |
| 22 | 17 | 3.8% |
| 24 | 16 | 3.6% |
| 35 | 12 | 2.7% |
| 18 | 12 | 2.7% |
| 21 | 12 | 2.7% |
| 36 | 12 | 2.7% |
| 34 | 11 | 2.5% |
| 29 | 11 | 2.5% |
| 28 | 11 | 2.5% |
| 30 | 11 | 2.5% |
| Other values (65) | 232 | |
| (Missing) | 89 | 20.0% |
| Value | Count | Frequency (%) |
| 0.42 | 1 | 0.2% |
| 0.75 | 2 | 0.4% |
| 0.83 | 1 | 0.2% |
| 1 | 5 | |
| 2 | 7 | |
| 3 | 3 | |
| 4 | 3 | |
| 5 | 4 | |
| 6 | 1 | 0.2% |
| 7 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0.67 | 1 | 0.2% |
| 0.75 | 1 | 0.2% |
| 0.83 | 1 | 0.2% |
| 1 | 4 | |
| 2 | 6 | |
| 3 | 3 | |
| 4 | 6 | |
| 5 | 4 | |
| 6 | 1 | 0.2% |
| 7 | 1 | 0.2% |
SibSp
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.54035874 | 0.58071749 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 8 | 8 |
| Zeros | 307 | 289 |
| Zeros (%) | 68.8% | 64.8% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 1 | 1 |
| 95-th percentile | 3 | 3 |
| Maximum | 8 | 8 |
| Range | 8 | 8 |
| Interquartile range (IQR) | 1 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 1.1541557 | 1.1343405 |
| Coefficient of variation (CV) | 2.1359064 | 1.9533432 |
| Kurtosis | 16.653329 | 14.240695 |
| Mean | 0.54035874 | 0.58071749 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 3.6055703 | 3.3066759 |
| Sum | 241 | 259 |
| Variance | 1.3320754 | 1.2867285 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 307 | |
| 1 | 98 | 22.0% |
| 2 | 15 | 3.4% |
| 3 | 10 | 2.2% |
| 4 | 9 | 2.0% |
| 8 | 4 | 0.9% |
| 5 | 3 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 289 | |
| 1 | 117 | |
| 2 | 12 | 2.7% |
| 4 | 11 | 2.5% |
| 3 | 10 | 2.2% |
| 5 | 4 | 0.9% |
| 8 | 3 | 0.7% |
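Because SibSp takes only seven distinct values, its summary statistics can be recomputed from the frequency tables. The sketch below derives the sum, mean, share of zeros, and variance. It assumes the reported variance is the sample variance (divisor n − 1), which is what the numbers are consistent with. The helper `summarize` is illustrative.

```python
# value -> count, copied from the SibSp frequency tables.
sibsp = {
    "Dataset A": {0: 307, 1: 98, 2: 15, 3: 10, 4: 9, 8: 4, 5: 3},
    "Dataset B": {0: 289, 1: 117, 2: 12, 4: 11, 3: 10, 5: 4, 8: 3},
}


def summarize(freq: dict) -> dict:
    """Summary statistics of a discrete variable from its frequency table."""
    n = sum(freq.values())
    total = sum(value * count for value, count in freq.items())
    mean = total / n
    # Sum of squared deviations from the mean, weighted by count.
    ss = sum(count * (value - mean) ** 2 for value, count in freq.items())
    return {
        "n": n,
        "sum": total,
        "mean": mean,
        "zeros_pct": round(100 * freq.get(0, 0) / n, 1),
        "variance": ss / (n - 1),  # sample variance (assumed convention)
    }


summary = {name: summarize(freq) for name, freq in sibsp.items()}
for name, s in summary.items():
    print(name, s)
```

The recomputed values match the Sum, Mean, Zeros (%), and Variance rows of the SibSp tables.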
| Value | Count | Frequency (%) |
| 0 | 307 | |
| 1 | 98 | 22.0% |
| 2 | 15 | 3.4% |
| 3 | 10 | 2.2% |
| 4 | 9 | 2.0% |
| 5 | 3 | 0.7% |
| 8 | 4 | 0.9% |
| Value | Count | Frequency (%) |
| 0 | 289 | |
| 1 | 117 | |
| 2 | 12 | 2.7% |
| 3 | 10 | 2.2% |
| 4 | 11 | 2.5% |
| 5 | 4 | 0.9% |
| 8 | 3 | 0.7% |
Parch
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 7 | 7 |
| Distinct (%) | 1.6% | 1.6% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 0.34529148 | 0.44170404 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 6 | 6 |
| Zeros | 343 | 326 |
| Zeros (%) | 76.9% | 73.1% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 0 | 0 |
| Q1 | 0 | 0 |
| median | 0 | 0 |
| Q3 | 0 | 1 |
| 95-th percentile | 2 | 2 |
| Maximum | 6 | 6 |
| Range | 6 | 6 |
| Interquartile range (IQR) | 0 | 1 |
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 0.74173364 | 0.85326117 |
| Coefficient of variation (CV) | 2.1481377 | 1.9317486 |
| Kurtosis | 12.394247 | 7.9834361 |
| Mean | 0.34529148 | 0.44170404 |
| Median Absolute Deviation (MAD) | 0 | 0 |
| Skewness | 2.931901 | 2.439101 |
| Sum | 154 | 197 |
| Variance | 0.55016879 | 0.72805462 |
| Monotonicity | Not monotonic | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 343 | |
| 1 | 63 | 14.1% |
| 2 | 35 | 7.8% |
| 3 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| 4 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 326 | |
| 1 | 59 | 13.2% |
| 2 | 54 | 12.1% |
| 4 | 2 | 0.4% |
| 3 | 2 | 0.4% |
| 5 | 2 | 0.4% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 343 | |
| 1 | 63 | 14.1% |
| 2 | 35 | 7.8% |
| 3 | 2 | 0.4% |
| 4 | 1 | 0.2% |
| 5 | 1 | 0.2% |
| 6 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 326 | |
| 1 | 59 | 13.2% |
| 2 | 54 | 12.1% |
| 3 | 2 | 0.4% |
| 4 | 2 | 0.4% |
| 5 | 2 | 0.4% |
| 6 | 1 | 0.2% |
Ticket
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 375 | 375 |
| Distinct (%) | 84.1% | 84.1% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Dataset A
| Value | Count |
|---|---|
| CA 2144 | 4 |
| CA. 2343 | 4 |
| 1601 | 4 |
| LINE | 4 |
| 4133 | 3 |
| Other values (370) | |

Dataset B
| Value | Count |
|---|---|
| CA 2144 | 5 |
| 17421 | 4 |
| 347082 | 4 |
| 347088 | 4 |
| 347077 | 4 |
| Other values (370) | |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 18 | 18 |
| Median length | 17 | 17 |
| Mean length | 6.67713 | 6.6950673 |
| Min length | 3 | 3 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 2978 | 2986 |
| Distinct characters | 35 | 31 |
| Distinct categories | 5 | 5 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 322 | 323 |
| Unique (%) | 72.2% | 72.4% |
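Ticket is the first column where "Distinct" (375) and "Unique" (322 and 323) differ. The sketch below assumes "Distinct" counts the different values and "Unique" counts the values that occur exactly once. That reading is consistent with Name (446 distinct, 446 unique) and Survived (2 distinct, 0 unique) above. The list `tickets` is a hypothetical mini-sample, not the real column.

```python
from collections import Counter

# Hypothetical mini-sample built from ticket strings seen in the tables.
tickets = ["CA 2144", "CA 2144", "1601", "LINE", "LINE", "4133"]

counts = Counter(tickets)
n_distinct = len(counts)                              # different values
n_unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

print(n_distinct, n_unique)

# The reported percentages are relative to the 446 observations.
n_observations = 446
distinct_pct = round(100 * 375 / n_observations, 1)
unique_pct = {
    "Dataset A": round(100 * 322 / n_observations, 1),
    "Dataset B": round(100 * 323 / n_observations, 1),
}
print(distinct_pct, unique_pct)
```

In the mini-sample, "CA 2144" and "LINE" are distinct values but not unique ones, so four distinct values yield only two unique ones.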
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | 3101295 | 2661 |
| 2nd row | 349240 | 113803 |
| 3rd row | 19972 | 13507 |
| 4th row | PC 17593 | 349242 |
| 5th row | SOTON/OQ 392089 | 17421 |
Common Values
| Value | Count | Frequency (%) |
| CA 2144 | 4 | 0.9% |
| CA. 2343 | 4 | 0.9% |
| 1601 | 4 | 0.9% |
| LINE | 4 | 0.9% |
| 4133 | 3 | 0.7% |
| S.O.C. 14879 | 3 | 0.7% |
| 113760 | 3 | 0.7% |
| PC 17760 | 3 | 0.7% |
| 347082 | 3 | 0.7% |
| 347088 | 3 | 0.7% |
| Other values (365) | 412 |
| Value | Count | Frequency (%) |
| CA 2144 | 5 | 1.1% |
| 17421 | 4 | 0.9% |
| 347082 | 4 | 0.9% |
| 347088 | 4 | 0.9% |
| 347077 | 4 | 0.9% |
| C.A. 34651 | 3 | 0.7% |
| CA. 2343 | 3 | 0.7% |
| 35273 | 3 | 0.7% |
| 347742 | 3 | 0.7% |
| PC 17755 | 3 | 0.7% |
| Other values (365) | 410 |
Length
Common Values (Plot)
Dataset A
Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
Dataset B
Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
| Value | Count | Frequency (%) |
| pc | 36 | 6.4% |
| c.a | 12 | 2.1% |
| ca | 9 | 1.6% |
| a/5 | 7 | 1.2% |
| ston/o | 6 | 1.1% |
| 2 | 6 | 1.1% |
| 2144 | 4 | 0.7% |
| soton/oq | 4 | 0.7% |
| line | 4 | 0.7% |
| 1601 | 4 | 0.7% |
| Other values (393) | 473 |
| Value | Count | Frequency (%) |
| pc | 34 | 6.0% |
| c.a | 14 | 2.5% |
| ca | 9 | 1.6% |
| a/5 | 8 | 1.4% |
| w./c | 6 | 1.1% |
| 2 | 6 | 1.1% |
| ston/o | 6 | 1.1% |
| sc/paris | 5 | 0.9% |
| 2144 | 5 | 0.9% |
| 17421 | 4 | 0.7% |
| Other values (393) | 468 |
Most occurring characters
| Value | Count | Frequency (%) |
| 3 | 373 | |
| 1 | 348 | |
| 2 | 289 | |
| 7 | 247 | |
| 4 | 235 | 7.9% |
| 6 | 228 | 7.7% |
| 0 | 213 | 7.2% |
| 5 | 178 | 6.0% |
| 9 | 150 | 5.0% |
| 8 | 126 | 4.2% |
| Other values (25) | 591 |
| Value | Count | Frequency (%) |
| 3 | 374 | |
| 1 | 337 | |
| 2 | 293 | |
| 7 | 246 | |
| 4 | 222 | 7.4% |
| 6 | 217 | 7.3% |
| 0 | 200 | 6.7% |
| 5 | 197 | 6.6% |
| 9 | 167 | 5.6% |
| 8 | 139 | 4.7% |
| Other values (21) | 594 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 2387 | |
| Uppercase Letter | 323 | 10.8% |
| Other Punctuation | 133 | 4.5% |
| Space Separator | 119 | 4.0% |
| Lowercase Letter | 16 | 0.5% |
| Value | Count | Frequency (%) |
| Decimal Number | 2392 | |
| Uppercase Letter | 323 | 10.8% |
| Other Punctuation | 144 | 4.8% |
| Space Separator | 119 | 4.0% |
| Lowercase Letter | 8 | 0.3% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 3 | 373 | |
| 1 | 348 | |
| 2 | 289 | |
| 7 | 247 | |
| 4 | 235 | |
| 6 | 228 | |
| 0 | 213 | |
| 5 | 178 | |
| 9 | 150 | |
| 8 | 126 | 5.3% |
| Value | Count | Frequency (%) |
| 3 | 374 | |
| 1 | 337 | |
| 2 | 293 | |
| 7 | 246 | |
| 4 | 222 | |
| 6 | 217 | |
| 0 | 200 | |
| 5 | 197 | |
| 9 | 167 | |
| 8 | 139 | 5.8% |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 119 | |
| Value | Count | Frequency (%) |
| (space) | 119 | |
Other Punctuation
| Value | Count | Frequency (%) |
| . | 92 | |
| / | 41 |
| Value | Count | Frequency (%) |
| . | 97 | |
| / | 47 |
Uppercase Letter
| Value | Count | Frequency (%) |
| C | 81 | |
| P | 54 | |
| O | 46 | |
| A | 37 | |
| S | 31 | 9.6% |
| N | 21 | 6.5% |
| T | 17 | 5.3% |
| Q | 7 | 2.2% |
| E | 6 | 1.9% |
| I | 6 | 1.9% |
| Other values (6) | 17 | 5.3% |
| Value | Count | Frequency (%) |
| C | 80 | |
| P | 54 | |
| O | 48 | |
| A | 39 | |
| S | 33 | |
| N | 18 | 5.6% |
| T | 17 | 5.3% |
| W | 11 | 3.4% |
| Q | 8 | 2.5% |
| I | 4 | 1.2% |
| Other values (4) | 11 | 3.4% |
Lowercase Letter
| Value | Count | Frequency (%) |
| a | 4 | |
| s | 4 | |
| i | 3 | |
| r | 3 | |
| l | 1 | 6.2% |
| e | 1 | 6.2% |
| Value | Count | Frequency (%) |
| r | 2 | |
| i | 2 | |
| s | 2 | |
| a | 2 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 2639 | |
| Latin | 339 | 11.4% |
| Value | Count | Frequency (%) |
| Common | 2655 | |
| Latin | 331 | 11.1% |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 3 | 373 | |
| 1 | 348 | |
| 2 | 289 | |
| 7 | 247 | |
| 4 | 235 | |
| 6 | 228 | |
| 0 | 213 | |
| 5 | 178 | |
| 9 | 150 | |
| 8 | 126 | 4.8% |
| Other values (3) | 252 |
| Value | Count | Frequency (%) |
| 3 | 374 | |
| 1 | 337 | |
| 2 | 293 | |
| 7 | 246 | |
| 4 | 222 | |
| 6 | 217 | |
| 0 | 200 | |
| 5 | 197 | |
| 9 | 167 | |
| 8 | 139 | 5.2% |
| Other values (3) | 263 |
Latin
| Value | Count | Frequency (%) |
| C | 81 | |
| P | 54 | |
| O | 46 | |
| A | 37 | |
| S | 31 | 9.1% |
| N | 21 | 6.2% |
| T | 17 | 5.0% |
| Q | 7 | 2.1% |
| E | 6 | 1.8% |
| I | 6 | 1.8% |
| Other values (12) | 33 |
| Value | Count | Frequency (%) |
| C | 80 | |
| P | 54 | |
| O | 48 | |
| A | 39 | |
| S | 33 | |
| N | 18 | 5.4% |
| T | 17 | 5.1% |
| W | 11 | 3.3% |
| Q | 8 | 2.4% |
| I | 4 | 1.2% |
| Other values (8) | 19 | 5.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 2978 |
| Value | Count | Frequency (%) |
| ASCII | 2986 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 3 | 373 | |
| 1 | 348 | |
| 2 | 289 | |
| 7 | 247 | |
| 4 | 235 | 7.9% |
| 6 | 228 | 7.7% |
| 0 | 213 | 7.2% |
| 5 | 178 | 6.0% |
| 9 | 150 | 5.0% |
| 8 | 126 | 4.2% |
| Other values (25) | 591 |
| Value | Count | Frequency (%) |
| 3 | 374 | |
| 1 | 337 | |
| 2 | 293 | |
| 7 | 246 | |
| 4 | 222 | 7.4% |
| 6 | 217 | 7.3% |
| 0 | 200 | 6.7% |
| 5 | 197 | 6.6% |
| 9 | 167 | 5.6% |
| 8 | 139 | 4.7% |
| Other values (21) | 594 |
Fare
Real number (ℝ)
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 172 | 177 |
| Distinct (%) | 38.6% | 39.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Infinite | 0 | 0 |
| Infinite (%) | 0.0% | 0.0% |
| Mean | 32.777746 | 35.344403 |
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| Maximum | 512.3292 | 512.3292 |
| Zeros | 9 | 10 |
| Zeros (%) | 2.0% | 2.2% |
| Negative | 0 | 0 |
| Negative (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Quantile statistics
| | Dataset A | Dataset B |
|---|---|---|
| Minimum | 0 | 0 |
| 5-th percentile | 7.225 | 7.225 |
| Q1 | 7.8958 | 8.05 |
| median | 14.4542 | 15.975 |
| Q3 | 31.275 | 32.875 |
| 95-th percentile | 110.8833 | 112.67708 |
| Maximum | 512.3292 | 512.3292 |
| Range | 512.3292 | 512.3292 |
| Interquartile range (IQR) | 23.3792 | 24.825 |
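The Range and IQR rows follow from the other quantiles in the table. A minimal sketch that recomputes them from the reported values (the dictionary layout and the `spread` helper are illustrative, not part of the report):

```python
# Quantiles copied from the Fare table above.
quantiles = {
    "Dataset A": {"min": 0.0, "q1": 7.8958, "q3": 31.275, "max": 512.3292},
    "Dataset B": {"min": 0.0, "q1": 8.05, "q3": 32.875, "max": 512.3292},
}


def spread(q):
    """Return (range, interquartile range) for one dataset's quantiles."""
    return q["max"] - q["min"], q["q3"] - q["q1"]


for name, q in quantiles.items():
    value_range, iqr = spread(q)
    print(f"{name}: range={value_range:.4f}, IQR={iqr:.4f}")
```

Dataset B has the slightly wider IQR, while both datasets share the same minimum and maximum and therefore the same range.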
Descriptive statistics
| | Dataset A | Dataset B |
|---|---|---|
| Standard deviation | 51.480352 | 56.871171 |
| Coefficient of variation (CV) | 1.5705886 | 1.6090573 |
| Kurtosis | 36.46659 | 34.669628 |
| Mean | 32.777746 | 35.344403 |
| Median Absolute Deviation (MAD) | 6.9584 | 8.7458 |
| Skewness | 5.0416265 | 5.0857181 |
| Sum | 14618.874 | 15763.604 |
| Variance | 2650.2267 | 3234.3301 |
| Monotonicity | Not monotonic | Not monotonic |
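Several rows in this table are functions of the mean, the standard deviation, and the number of observations. The sketch below recomputes them from the reported figures as a consistency check. It assumes 446 observations per dataset (from the report overview), and small deviations are expected because the inputs are already rounded.

```python
N_OBS = 446  # observations per dataset, taken from the report overview

# Mean and standard deviation copied from the Fare table above.
stats = {
    "Dataset A": {"mean": 32.777746, "std": 51.480352},
    "Dataset B": {"mean": 35.344403, "std": 56.871171},
}


def derived(s, n=N_OBS):
    """Recompute CV, variance and sum from mean, std and row count."""
    return {
        "cv": s["std"] / s["mean"],   # coefficient of variation
        "variance": s["std"] ** 2,    # variance is the squared std
        "sum": s["mean"] * n,         # sum = mean * number of rows (no missing Fare values)
    }


for name, s in stats.items():
    d = derived(s)
    print(f"{name}: CV={d['cv']:.4f}, variance={d['variance']:.2f}, sum={d['sum']:.2f}")
```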
| Value | Count | Frequency (%) |
| 7.75 | 22 | 4.9% |
| 13 | 21 | 4.7% |
| 8.05 | 19 | 4.3% |
| 7.8958 | 17 | 3.8% |
| 26 | 14 | 3.1% |
| 10.5 | 12 | 2.7% |
| 7.925 | 10 | 2.2% |
| 26.55 | 10 | 2.2% |
| 0 | 9 | 2.0% |
| 7.8542 | 9 | 2.0% |
| Other values (162) | 303 |
| Value | Count | Frequency (%) |
| 8.05 | 25 | 5.6% |
| 13 | 17 | 3.8% |
| 7.75 | 16 | 3.6% |
| 7.8958 | 15 | 3.4% |
| 26 | 15 | 3.4% |
| 10.5 | 12 | 2.7% |
| 26.55 | 11 | 2.5% |
| 0 | 10 | 2.2% |
| 7.2292 | 10 | 2.2% |
| 7.925 | 7 | 1.6% |
| Other values (167) | 308 |
| Value | Count | Frequency (%) |
| 0 | 9 | |
| 5 | 1 | 0.2% |
| 6.2375 | 1 | 0.2% |
| 6.4375 | 1 | 0.2% |
| 6.4958 | 2 | 0.4% |
| 6.75 | 2 | 0.4% |
| 6.95 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 3 | 0.7% |
| 7.125 | 1 | 0.2% |
| Value | Count | Frequency (%) |
| 0 | 10 | |
| 5 | 1 | 0.2% |
| 6.4958 | 2 | 0.4% |
| 6.75 | 1 | 0.2% |
| 6.8583 | 1 | 0.2% |
| 6.95 | 1 | 0.2% |
| 7.0458 | 1 | 0.2% |
| 7.05 | 3 | 0.7% |
| 7.0542 | 1 | 0.2% |
| 7.125 | 1 | 0.2% |
Cabin
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 82 | 93 |
| Distinct (%) | 82.8% | 83.0% |
| Missing | 347 | 334 |
| Missing (%) | 77.8% | 74.9% |
| Memory size | 7.0 KiB | 7.0 KiB |
| Value | Count |
|---|---|
| B96 B98 | 3 |
| B18 | 2 |
| D | 2 |
| E101 | 2 |
| C65 | 2 |
| Other values (77) | |
| Value | Count |
|---|---|
| B96 B98 | 3 |
| F33 | 3 |
| C23 C25 C27 | 3 |
| D36 | 2 |
| C123 | 2 |
| Other values (88) | |
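The two percentages in the Cabin overview appear to use different denominators: Missing (%) is relative to all rows, while Distinct (%) matches the reported values only when divided by the non-missing count. The sketch below checks this. It assumes 446 rows per dataset (from the report overview), and the `percentages` helper is illustrative.

```python
N_OBS = 446  # rows per dataset, taken from the report overview

# Counts copied from the Cabin overview table.
cabin = {
    "Dataset A": {"distinct": 82, "missing": 347},
    "Dataset B": {"distinct": 93, "missing": 334},
}


def percentages(c, n=N_OBS):
    """Missing share over all rows; distinct share over non-missing rows."""
    present = n - c["missing"]
    return {
        "present": present,
        "missing_pct": 100 * c["missing"] / n,
        "distinct_pct": 100 * c["distinct"] / present,
    }


for name, c in cabin.items():
    p = percentages(c)
    print(f"{name}: present={p['present']}, "
          f"missing={p['missing_pct']:.1f}%, distinct={p['distinct_pct']:.1f}%")
```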
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 11 | 15 |
| Median length | 3 | 3 |
| Mean length | 3.4646465 | 3.75 |
| Min length | 1 | 2 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 343 | 420 |
| Distinct characters | 18 | 18 |
| Distinct categories | 3 | 3 |
| Distinct scripts | 2 | 2 |
| Distinct blocks | 1 | 1 |
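The reported mean length is consistent with dividing the total character count by the number of non-missing Cabin values. A minimal check, assuming 446 rows per dataset and the missing counts from the Cabin overview (the names here are illustrative):

```python
N_OBS = 446  # rows per dataset, taken from the report overview

# Total characters and missing counts copied from the Cabin tables.
cabin_chars = {
    "Dataset A": {"total_chars": 343, "missing": 347},
    "Dataset B": {"total_chars": 420, "missing": 334},
}


def mean_length(c, n=N_OBS):
    """Average string length over the non-missing values."""
    return c["total_chars"] / (n - c["missing"])


for name, c in cabin_chars.items():
    print(f"{name}: mean length = {mean_length(c):.7f}")
```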
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 66 | 77 |
| Unique (%) | 66.7% | 68.8% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | B82 B84 | C123 |
| 2nd row | C78 | E44 |
| 3rd row | B3 | C68 |
| 4th row | C124 | C126 |
| 5th row | D6 | B49 |
Common Values
| Value | Count | Frequency (%) |
| B96 B98 | 3 | 0.7% |
| B18 | 2 | 0.4% |
| D | 2 | 0.4% |
| E101 | 2 | 0.4% |
| C65 | 2 | 0.4% |
| B51 B53 B55 | 2 | 0.4% |
| D36 | 2 | 0.4% |
| C68 | 2 | 0.4% |
| E8 | 2 | 0.4% |
| C93 | 2 | 0.4% |
| Other values (72) | 78 | 17.5% |
| (Missing) | 347 |
| Value | Count | Frequency (%) |
| B96 B98 | 3 | 0.7% |
| F33 | 3 | 0.7% |
| C23 C25 C27 | 3 | 0.7% |
| D36 | 2 | 0.4% |
| C123 | 2 | 0.4% |
| C93 | 2 | 0.4% |
| E67 | 2 | 0.4% |
| C65 | 2 | 0.4% |
| C83 | 2 | 0.4% |
| F2 | 2 | 0.4% |
| Other values (83) | 89 | 20.0% |
| (Missing) | 334 |
Length
Common Values (Plot)
Dataset A
Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
Dataset B
Number of variable categories passes threshold (config.plot.cat_freq.max_unique)
| Value | Count | Frequency (%) |
| b96 | 3 | 2.7% |
| b98 | 3 | 2.7% |
| e8 | 2 | 1.8% |
| b5 | 2 | 1.8% |
| e24 | 2 | 1.8% |
| b35 | 2 | 1.8% |
| c27 | 2 | 1.8% |
| c25 | 2 | 1.8% |
| c23 | 2 | 1.8% |
| d20 | 2 | 1.8% |
| Other values (80) | 91 |
| Value | Count | Frequency (%) |
| b96 | 3 | 2.2% |
| f33 | 3 | 2.2% |
| c23 | 3 | 2.2% |
| c25 | 3 | 2.2% |
| c27 | 3 | 2.2% |
| b98 | 3 | 2.2% |
| b51 | 2 | 1.5% |
| f | 2 | 1.5% |
| f4 | 2 | 1.5% |
| e8 | 2 | 1.5% |
| Other values (96) | 108 |
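The word tables above look like the result of lowercasing each Cabin value and splitting it on whitespace, so that a multi-cabin entry such as `B96 B98` contributes one count each to `b96` and `b98`. The exact tokenizer used by the report is an assumption here. The sketch applies that rule to the five Dataset A sample rows only, so its counts are not the full-table counts.

```python
from collections import Counter

# The five Dataset A sample rows from the Sample table (not the full column).
sample_cabins = ["B82 B84", "C78", "B3", "C124", "D6"]


def word_counts(values):
    """Count lowercase, whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


counts = word_counts(sample_cabins)
total = sum(counts.values())
for word, n in counts.most_common():
    print(f"{word}: {n} ({100 * n / total:.1f}%)")
```

Under this rule the denominator is the number of tokens rather than the number of rows, which would explain why the percentages in the word tables differ from those in Common Values.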
Most occurring characters
| Value | Count | Frequency (%) |
| B | 32 | 9.3% |
| 2 | 30 | 8.7% |
| C | 30 | 8.7% |
| 3 | 28 | 8.2% |
| 1 | 27 | 7.9% |
| 5 | 26 | 7.6% |
| 6 | 24 | 7.0% |
| 8 | 19 | 5.5% |
| D | 19 | 5.5% |
| 9 | 19 | 5.5% |
| Other values (8) | 89 |
| Value | Count | Frequency (%) |
| C | 43 | |
| 3 | 39 | 9.3% |
| 2 | 39 | 9.3% |
| B | 36 | 8.6% |
| 1 | 33 | 7.9% |
| 6 | 32 | 7.6% |
| 5 | 25 | 6.0% |
| 8 | 23 | 5.5% |
| (space) | 22 | 5.2% |
| 4 | 22 | 5.2% |
| Other values (8) | 106 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 216 | |
| Uppercase Letter | 113 | |
| Space Separator | 14 | 4.1% |
| Value | Count | Frequency (%) |
| Decimal Number | 264 | |
| Uppercase Letter | 134 | |
| Space Separator | 22 | 5.2% |
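The category names in these tables correspond to Unicode general categories (`Nd`, `Lu`, `Zs`). A minimal sketch of such a tally using Python's standard `unicodedata` module, applied to the five Dataset A sample rows only. The name mapping and the helper are illustrative, and the report's own implementation may differ.

```python
import unicodedata
from collections import Counter

# Human-readable names for the three categories that occur in Cabin values.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
}


def category_counts(values):
    """Tally characters by Unicode general category across all values."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


# The five Dataset A sample rows from the Sample table (not the full column).
sample_cabins = ["B82 B84", "C78", "B3", "C124", "D6"]
category_totals = category_counts(sample_cabins)
for name, n in category_totals.most_common():
    print(f"{name}: {n}")
```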
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| B | 32 | |
| C | 30 | |
| D | 19 | |
| E | 18 | |
| A | 9 | 8.0% |
| F | 4 | 3.5% |
| G | 1 | 0.9% |
| Value | Count | Frequency (%) |
| C | 43 | |
| B | 36 | |
| E | 19 | |
| D | 18 | |
| F | 9 | 6.7% |
| A | 6 | 4.5% |
| G | 3 | 2.2% |
Decimal Number
| Value | Count | Frequency (%) |
| 2 | 30 | |
| 3 | 28 | |
| 1 | 27 | |
| 5 | 26 | |
| 6 | 24 | |
| 8 | 19 | |
| 9 | 19 | |
| 4 | 18 | |
| 0 | 13 | |
| 7 | 12 | 5.6% |
| Value | Count | Frequency (%) |
| 3 | 39 | |
| 2 | 39 | |
| 1 | 33 | |
| 6 | 32 | |
| 5 | 25 | |
| 8 | 23 | |
| 4 | 22 | |
| 9 | 19 | |
| 7 | 16 | |
| 0 | 16 |
Space Separator
| Value | Count | Frequency (%) |
| (space) | 14 | |
| Value | Count | Frequency (%) |
| (space) | 22 | |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 230 | |
| Latin | 113 |
| Value | Count | Frequency (%) |
| Common | 286 | |
| Latin | 134 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| B | 32 | |
| C | 30 | |
| D | 19 | |
| E | 18 | |
| A | 9 | 8.0% |
| F | 4 | 3.5% |
| G | 1 | 0.9% |
| Value | Count | Frequency (%) |
| C | 43 | |
| B | 36 | |
| E | 19 | |
| D | 18 | |
| F | 9 | 6.7% |
| A | 6 | 4.5% |
| G | 3 | 2.2% |
Common
| Value | Count | Frequency (%) |
| 2 | 30 | |
| 3 | 28 | |
| 1 | 27 | |
| 5 | 26 | |
| 6 | 24 | |
| 8 | 19 | |
| 9 | 19 | |
| 4 | 18 | |
| (space) | 14 | |
| 0 | 13 |
| Value | Count | Frequency (%) |
| 3 | 39 | |
| 2 | 39 | |
| 1 | 33 | |
| 6 | 32 | |
| 5 | 25 | |
| 8 | 23 | |
| (space) | 22 | |
| 4 | 22 | |
| 9 | 19 | |
| 7 | 16 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 343 |
| Value | Count | Frequency (%) |
| ASCII | 420 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| B | 32 | 9.3% |
| 2 | 30 | 8.7% |
| C | 30 | 8.7% |
| 3 | 28 | 8.2% |
| 1 | 27 | 7.9% |
| 5 | 26 | 7.6% |
| 6 | 24 | 7.0% |
| 8 | 19 | 5.5% |
| D | 19 | 5.5% |
| 9 | 19 | 5.5% |
| Other values (8) | 89 |
| Value | Count | Frequency (%) |
| C | 43 | |
| 3 | 39 | 9.3% |
| 2 | 39 | 9.3% |
| B | 36 | 8.6% |
| 1 | 33 | 7.9% |
| 6 | 32 | 7.6% |
| 5 | 25 | 6.0% |
| 8 | 23 | 5.5% |
| (space) | 22 | 5.2% |
| 4 | 22 | 5.2% |
| Other values (8) | 106 |
Embarked
Categorical
| | Dataset A | Dataset B |
|---|---|---|
| Distinct | 3 | 3 |
| Distinct (%) | 0.7% | 0.7% |
| Missing | 0 | 0 |
| Missing (%) | 0.0% | 0.0% |
| Memory size | 7.0 KiB | 7.0 KiB |
Length
| | Dataset A | Dataset B |
|---|---|---|
| Max length | 1 | 1 |
| Median length | 1 | 1 |
| Mean length | 1 | 1 |
| Min length | 1 | 1 |
Characters and Unicode
| | Dataset A | Dataset B |
|---|---|---|
| Total characters | 446 | 446 |
| Distinct characters | 3 | 3 |
| Distinct categories | 1 | 1 |
| Distinct scripts | 1 | 1 |
| Distinct blocks | 1 | 1 |
Unique
| | Dataset A | Dataset B |
|---|---|---|
| Unique | 0 | 0 |
| Unique (%) | 0.0% | 0.0% |
Sample
| | Dataset A | Dataset B |
|---|---|---|
| 1st row | S | C |
| 2nd row | C | S |
| 3rd row | S | S |
| 4th row | C | S |
| 5th row | S | C |
Common Values
| Value | Count | Frequency (%) |
| S | 317 | |
| C | 86 | 19.3% |
| Q | 43 | 9.6% |
| Value | Count | Frequency (%) |
| S | 312 | |
| C | 98 | 22.0% |
| Q | 36 | 8.1% |
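The frequency cell for `S` is blank in the extracted tables, most likely because the report draws it inside the bar. It can be recovered from the counts, since Embarked has no missing values. A minimal sketch (the `shares` helper is illustrative):

```python
# Counts copied from the Embarked Common Values tables.
embarked_counts = {
    "Dataset A": {"S": 317, "C": 86, "Q": 43},
    "Dataset B": {"S": 312, "C": 98, "Q": 36},
}


def shares(counts):
    """Percentage share of each value, rounded to one decimal place."""
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.items()}


for name, counts in embarked_counts.items():
    print(name, shares(counts))
```

The recomputed shares for `C` and `Q` match the percentages printed in the tables, which supports using the same formula for `S`.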
Length
Common Values (Plot)
Dataset A
Dataset B
| Value | Count | Frequency (%) |
| s | 317 | |
| c | 86 | 19.3% |
| q | 43 | 9.6% |
| Value | Count | Frequency (%) |
| s | 312 | |
| c | 98 | 22.0% |
| q | 36 | 8.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| S | 317 | |
| C | 86 | 19.3% |
| Q | 43 | 9.6% |
| Value | Count | Frequency (%) |
| S | 312 | |
| C | 98 | 22.0% |
| Q | 36 | 8.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 446 |
| Value | Count | Frequency (%) |
| Uppercase Letter | 446 |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| S | 317 | |
| C | 86 | 19.3% |
| Q | 43 | 9.6% |
| Value | Count | Frequency (%) |
| S | 312 | |
| C | 98 | 22.0% |
| Q | 36 | 8.1% |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 446 |
| Value | Count | Frequency (%) |
| Latin | 446 |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| S | 317 | |
| C | 86 | 19.3% |
| Q | 43 | 9.6% |
| Value | Count | Frequency (%) |
| S | 312 | |
| C | 98 | 22.0% |
| Q | 36 | 8.1% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 446 |
| Value | Count | Frequency (%) |
| ASCII | 446 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| S | 317 | |
| C | 86 | 19.3% |
| Q | 43 | 9.6% |
| Value | Count | Frequency (%) |
| S | 312 | |
| C | 98 | 22.0% |
| Q | 36 | 8.1% |
Dataset A
| | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| PassengerId | 1.000 | -0.027 | -0.065 | 0.031 | 0.023 | 0.067 | 0.000 | 0.000 | 0.099 | 0.000 |
| Age | -0.027 | 1.000 | -0.213 | -0.273 | 0.162 | 0.160 | 0.311 | 0.149 | 0.280 | 0.182 |
| SibSp | -0.065 | -0.213 | 1.000 | 0.465 | 0.407 | 0.183 | 0.121 | 0.227 | 0.423 | 0.093 |
| Parch | 0.031 | -0.273 | 0.465 | 1.000 | 0.402 | 0.129 | 0.000 | 0.254 | 0.383 | 0.078 |
| Fare | 0.023 | 0.162 | 0.407 | 0.402 | 1.000 | 0.266 | 0.459 | 0.174 | 0.273 | 0.192 |
| Survived | 0.067 | 0.160 | 0.183 | 0.129 | 0.266 | 1.000 | 0.308 | 0.566 | 0.242 | 0.109 |
| Pclass | 0.000 | 0.311 | 0.121 | 0.000 | 0.459 | 0.308 | 1.000 | 0.105 | 0.421 | 0.262 |
| Sex | 0.000 | 0.149 | 0.227 | 0.254 | 0.174 | 0.566 | 0.105 | 1.000 | 0.160 | 0.098 |
| Cabin | 0.099 | 0.280 | 0.423 | 0.383 | 0.273 | 0.242 | 0.421 | 0.160 | 1.000 | 0.407 |
| Embarked | 0.000 | 0.182 | 0.093 | 0.078 | 0.192 | 0.109 | 0.262 | 0.098 | 0.407 | 1.000 |
Dataset B
| | PassengerId | Age | SibSp | Parch | Fare | Survived | Pclass | Sex | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|
| PassengerId | 1.000 | 0.092 | -0.029 | -0.020 | -0.027 | 0.014 | 0.000 | 0.038 | 0.022 | 0.000 |
| Age | 0.092 | 1.000 | -0.238 | -0.273 | 0.130 | 0.075 | 0.275 | 0.112 | 0.288 | 0.081 |
| SibSp | -0.029 | -0.238 | 1.000 | 0.488 | 0.466 | 0.208 | 0.152 | 0.215 | 0.411 | 0.114 |
| Parch | -0.020 | -0.273 | 0.488 | 1.000 | 0.388 | 0.083 | 0.000 | 0.257 | 0.355 | 0.032 |
| Fare | -0.027 | 0.130 | 0.466 | 0.388 | 1.000 | 0.273 | 0.472 | 0.186 | 0.353 | 0.229 |
| Survived | 0.014 | 0.075 | 0.208 | 0.083 | 0.273 | 1.000 | 0.330 | 0.532 | 0.096 | 0.176 |
| Pclass | 0.000 | 0.275 | 0.152 | 0.000 | 0.472 | 0.330 | 1.000 | 0.131 | 0.418 | 0.263 |
| Sex | 0.038 | 0.112 | 0.215 | 0.257 | 0.186 | 0.532 | 0.131 | 1.000 | 0.000 | 0.094 |
| Cabin | 0.022 | 0.288 | 0.411 | 0.355 | 0.353 | 0.096 | 0.418 | 0.000 | 1.000 | 0.405 |
| Embarked | 0.000 | 0.081 | 0.114 | 0.032 | 0.229 | 0.176 | 0.263 | 0.094 | 0.405 | 1.000 |
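One way to read the two matrices side by side is to rank variable pairs by how much their coefficient changes between datasets. The sketch below does this for a handful of pairs copied from the tables above. The selection of pairs and the `largest_shifts` helper are illustrative, and the absolute difference is just one possible measure of change.

```python
# (variable pair) -> (Dataset A, Dataset B) coefficients copied from the matrices.
pairs = {
    ("Survived", "Sex"): (0.566, 0.532),
    ("Survived", "Cabin"): (0.242, 0.096),
    ("Sex", "Cabin"): (0.160, 0.000),
    ("Fare", "Pclass"): (0.459, 0.472),
    ("SibSp", "Parch"): (0.465, 0.488),
}


def largest_shifts(pairs, top=3):
    """Return the `top` pairs with the largest absolute change between datasets."""
    diffs = {pair: abs(a - b) for pair, (a, b) in pairs.items()}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)[:top]


for (left, right), diff in largest_shifts(pairs):
    print(f"{left} vs {right}: |A - B| = {diff:.3f}")
```

Among these pairs, the Cabin-related coefficients move the most, while the Survived and Sex association flagged in the alerts stays comparatively stable.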
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 164 | 165 | 0 | 3 | Panula, Master. Eino Viljami | male | 1.0 | 4 | 1 | 3101295 | 39.6875 | NaN | S |
| 455 | 456 | 1 | 3 | Jalsevac, Mr. Ivan | male | 29.0 | 0 | 0 | 349240 | 7.8958 | NaN | C |
| 822 | 823 | 0 | 1 | Reuchlin, Jonkheer. John George | male | 38.0 | 0 | 0 | 19972 | 0.0000 | NaN | S |
| 789 | 790 | 0 | 1 | Guggenheim, Mr. Benjamin | male | 46.0 | 0 | 0 | PC 17593 | 79.2000 | B82 B84 | C |
| 220 | 221 | 1 | 3 | Sunderland, Mr. Victor Francis | male | 16.0 | 0 | 0 | SOTON/OQ 392089 | 8.0500 | NaN | S |
| 412 | 413 | 1 | 1 | Minahan, Miss. Daisy E | female | 33.0 | 1 | 0 | 19928 | 90.0000 | C78 | Q |
| 779 | 780 | 1 | 1 | Robert, Mrs. Edward Scott (Elisabeth Walton McMillan) | female | 43.0 | 0 | 1 | 24160 | 211.3375 | B3 | S |
| 497 | 498 | 0 | 3 | Shellard, Mr. Frederick William | male | NaN | 0 | 0 | C.A. 6212 | 15.1000 | NaN | S |
| 711 | 712 | 0 | 1 | Klaber, Mr. Herman | male | NaN | 0 | 0 | 113028 | 26.5500 | C124 | S |
| 254 | 255 | 0 | 3 | Rosblom, Mrs. Viktor (Helena Wilhelmina) | female | 41.0 | 0 | 2 | 370129 | 20.2125 | NaN | S |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 65 | 66 | 1 | 3 | Moubarek, Master. Gerios | male | NaN | 1 | 1 | 2661 | 15.2458 | NaN | C |
| 137 | 138 | 0 | 1 | Futrelle, Mr. Jacques Heath | male | 37.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 434 | 435 | 0 | 1 | Silvey, Mr. William Baird | male | 50.0 | 1 | 0 | 13507 | 55.9000 | E44 | S |
| 519 | 520 | 0 | 3 | Pavlovic, Mr. Stefo | male | 32.0 | 0 | 0 | 349242 | 7.8958 | NaN | S |
| 698 | 699 | 0 | 1 | Thayer, Mr. John Borland | male | 49.0 | 1 | 1 | 17421 | 110.8833 | C68 | C |
| 490 | 491 | 0 | 3 | Hagland, Mr. Konrad Mathias Reiersen | male | NaN | 1 | 0 | 65304 | 19.9667 | NaN | S |
| 712 | 713 | 1 | 1 | Taylor, Mr. Elmer Zebley | male | 48.0 | 1 | 0 | 19996 | 52.0000 | C126 | S |
| 563 | 564 | 0 | 3 | Simmons, Mr. John | male | NaN | 0 | 0 | SOTON/OQ 392082 | 8.0500 | NaN | S |
| 291 | 292 | 1 | 1 | Bishop, Mrs. Dickinson H (Helen Walton) | female | 19.0 | 1 | 0 | 11967 | 91.0792 | B49 | C |
| 689 | 690 | 1 | 1 | Madill, Miss. Georgette Alexandra | female | 15.0 | 0 | 1 | 24160 | 211.3375 | B5 | S |
Dataset A
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 393 | 394 | 1 | 1 | Newell, Miss. Marjorie | female | 23.0 | 1 | 0 | 35273 | 113.2750 | D36 | C |
| 724 | 725 | 1 | 1 | Chambers, Mr. Norman Campbell | male | 27.0 | 1 | 0 | 113806 | 53.1000 | E8 | S |
| 629 | 630 | 0 | 3 | O'Connell, Mr. Patrick D | male | NaN | 0 | 0 | 334912 | 7.7333 | NaN | Q |
| 121 | 122 | 0 | 3 | Moore, Mr. Leonard Charles | male | NaN | 0 | 0 | A4. 54510 | 8.0500 | NaN | S |
| 374 | 375 | 0 | 3 | Palsson, Miss. Stina Viola | female | 3.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 725 | 726 | 0 | 3 | Oreskovic, Mr. Luka | male | 20.0 | 0 | 0 | 315094 | 8.6625 | NaN | S |
| 79 | 80 | 1 | 3 | Dowdell, Miss. Elizabeth | female | 30.0 | 0 | 0 | 364516 | 12.4750 | NaN | S |
| 241 | 242 | 1 | 3 | Murphy, Miss. Katherine "Kate" | female | NaN | 1 | 0 | 367230 | 15.5000 | NaN | Q |
| 647 | 648 | 1 | 1 | Simonius-Blumer, Col. Oberst Alfons | male | 56.0 | 0 | 0 | 13213 | 35.5000 | A26 | C |
| 766 | 767 | 0 | 1 | Brewe, Dr. Arthur Jackson | male | NaN | 0 | 0 | 112379 | 39.6000 | NaN | C |
Dataset B
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 763 | 764 | 1 | 1 | Carter, Mrs. William Ernest (Lucile Polk) | female | 36.0 | 1 | 2 | 113760 | 120.0000 | B96 B98 | S |
| 509 | 510 | 1 | 3 | Lang, Mr. Fang | male | 26.0 | 0 | 0 | 1601 | 56.4958 | NaN | S |
| 646 | 647 | 0 | 3 | Cor, Mr. Liudevit | male | 19.0 | 0 | 0 | 349231 | 7.8958 | NaN | S |
| 475 | 476 | 0 | 1 | Clifford, Mr. George Quincy | male | NaN | 0 | 0 | 110465 | 52.0000 | A14 | S |
| 146 | 147 | 1 | 3 | Andersson, Mr. August Edvard ("Wennerstrom") | male | 27.0 | 0 | 0 | 350043 | 7.7958 | NaN | S |
| 667 | 668 | 0 | 3 | Rommetvedt, Mr. Knud Paust | male | NaN | 0 | 0 | 312993 | 7.7750 | NaN | S |
| 765 | 766 | 1 | 1 | Hogeboom, Mrs. John C (Anna Andrews) | female | 51.0 | 1 | 0 | 13502 | 77.9583 | D11 | S |
| 661 | 662 | 0 | 3 | Badt, Mr. Mohamed | male | 40.0 | 0 | 0 | 2623 | 7.2250 | NaN | C |
| 447 | 448 | 1 | 1 | Seward, Mr. Frederic Kimber | male | 34.0 | 0 | 0 | 113794 | 26.5500 | NaN | S |
| 725 | 726 | 0 | 3 | Oreskovic, Mr. Luka | male | 20.0 | 0 | 0 | 315094 | 8.6625 | NaN | S |
Dataset A
Dataset does not contain duplicate rows.
Dataset B
Dataset does not contain duplicate rows.